MersV1, Main, Exploration, bibRecord, 002006

Efficient algorithms for biological stems search

Identifieur interne : 002006 ( Main/Exploration ); précédent : 002005; suivant : 002007

Efficient algorithms for biological stems search

Auteurs : Tian Mi [États-Unis] ; Sanguthevar Rajasekaran [États-Unis]

Source :

BMC Bioinformatics [ 1471-2105 ] ; 2013.

RBID : PMC:3679804

Descripteurs français

KwdFr :
- ADN (), ARN (), Algorithmes, Analyse de séquence d'ADN, Analyse de séquence de protéine, Motifs d'acides aminés, Motifs nucléotidiques.
MESH :
- ADN, ARN, Algorithmes, Analyse de séquence d'ADN, Analyse de séquence de protéine, Motifs d'acides aminés, Motifs nucléotidiques.

English descriptors

KwdEn :
- Algorithms, Amino Acid Motifs, DNA (chemistry), Nucleotide Motifs, RNA (chemistry), Sequence Analysis, DNA, Sequence Analysis, Protein.
MESH :
- chemical , chemistry : DNA, RNA.
- Algorithms, Amino Acid Motifs, Nucleotide Motifs, Sequence Analysis, DNA, Sequence Analysis, Protein.

Abstract

Background

Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, like identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in the literature. One such version is called the Planted Motif Search (PMS)or (l, d)-motif Search. PMS is known to be NP complete. The time complexities of most of the planted motif search algorithms depend exponentially on the alphabet size. Recently a new version of the motif search problem has been introduced by Kuksa and Pavlovic. We call this version as the Motif Stems Search (MSS) problem. A motif stem is an l-mer (for some relevant value of l)with some wildcard characters and hence corresponds to a set of l-mers (without wildcards), some of which are (l, d)-motifs. Kuksa and Pavlovic have presented an efficient algorithm to find motif stems for inputs from large alphabets. Ideally, the number of stems output should be as small as possible since the stems form a superset of the motifs.

Results

In this paper we propose an efficient algorithm for MSS and evaluate it on both synthetic and real data. This evaluation reveals that our algorithm is much faster than Kuksa and Pavlovic’s algorithm.

Conclusions

Our MSS algorithm outperforms the algorithm of Kuksa and Pavlovic in terms of the run time as well as the number of stems output. Specifically, the stems output by our algorithm form a proper (and much smaller)subset of the stems output by Kuksa and Pavlovic’s algorithm.

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679804

DOI: 10.1186/1471-2105-14-161
PubMed: 23679045
PubMed Central: 3679804

Affiliations:

Le document en format XML

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Efficient algorithms for biological stems search</title>
<author><name sortKey="Mi, Tian" sort="Mi, Tian" uniqKey="Mi T" first="Tian" last="Mi">Tian Mi</name>
<affiliation wicri:level="2"><nlm:aff id="I1">Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation wicri:level="2"><nlm:aff id="I1">Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">23679045</idno>
<idno type="pmc">3679804</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679804</idno>
<idno type="RBID">PMC:3679804</idno>
<idno type="doi">10.1186/1471-2105-14-161</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">000946</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000946</idno>
<idno type="wicri:Area/Pmc/Curation">000946</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000946</idno>
<idno type="wicri:Area/Pmc/Checkpoint">001233</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">001233</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:23679045</idno>
<idno type="wicri:Area/PubMed/Corpus">001C92</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001C92</idno>
<idno type="wicri:Area/PubMed/Curation">001C92</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001C92</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001B58</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001B58</idno>
<idno type="wicri:Area/Ncbi/Merge">000A49</idno>
<idno type="wicri:Area/Ncbi/Curation">000A49</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000A49</idno>
<idno type="wicri:Area/Main/Merge">002027</idno>
<idno type="wicri:Area/Main/Curation">002006</idno>
<idno type="wicri:Area/Main/Exploration">002006</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Efficient algorithms for biological stems search</title>
<author><name sortKey="Mi, Tian" sort="Mi, Tian" uniqKey="Mi T" first="Tian" last="Mi">Tian Mi</name>
<affiliation wicri:level="2"><nlm:aff id="I1">Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation wicri:level="2"><nlm:aff id="I1">Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint><date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Amino Acid Motifs</term>
<term>DNA (chemistry)</term>
<term>Nucleotide Motifs</term>
<term>RNA (chemistry)</term>
<term>Sequence Analysis, DNA</term>
<term>Sequence Analysis, Protein</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>ADN ()</term>
<term>ARN ()</term>
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Analyse de séquence de protéine</term>
<term>Motifs d'acides aminés</term>
<term>Motifs nucléotidiques</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="chemistry" xml:lang="en"><term>DNA</term>
<term>RNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Amino Acid Motifs</term>
<term>Nucleotide Motifs</term>
<term>Sequence Analysis, DNA</term>
<term>Sequence Analysis, Protein</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>ADN</term>
<term>ARN</term>
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Analyse de séquence de protéine</term>
<term>Motifs d'acides aminés</term>
<term>Motifs nucléotidiques</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, like identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in the literature. One such version is called the Planted Motif Search (PMS)or (<italic>l, d</italic>
)-motif Search. PMS is known to be NP complete. The time complexities of most of the planted motif search algorithms depend exponentially on the alphabet size. Recently a new version of the motif search problem has been introduced by Kuksa and Pavlovic. We call this version as the <italic>Motif Stems Search (MSS)</italic>
 problem. A motif stem is an <italic>l</italic>
-mer (for some relevant value of <italic>l</italic>
)with some wildcard characters and hence corresponds to a set of <italic>l</italic>
-mers (without wildcards), some of which are (<italic>l, d</italic>
)-motifs. Kuksa and Pavlovic have presented an efficient algorithm to find motif stems for inputs from large alphabets. Ideally, the number of stems output should be as small as possible since the stems form a superset of the motifs.</p>
</sec>
<sec><title>Results</title>
<p>In this paper we propose an efficient algorithm for MSS and evaluate it on both synthetic and real data. This evaluation reveals that our algorithm is much faster than Kuksa and Pavlovic’s algorithm.</p>
</sec>
<sec><title>Conclusions</title>
<p>Our MSS algorithm outperforms the algorithm of Kuksa and Pavlovic in terms of the run time as well as the number of stems output. Specifically, the stems output by our algorithm form a proper (and much smaller)subset of the stems output by Kuksa and Pavlovic’s algorithm.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Mi, T" uniqKey="Mi T">T Mi</name>
</author>
<author><name sortKey="Merlin, Jc" uniqKey="Merlin J">JC Merlin</name>
</author>
<author><name sortKey="Deverasetty, S" uniqKey="Deverasetty S">S Deverasetty</name>
</author>
<author><name sortKey="Gryk, Mr" uniqKey="Gryk M">MR Gryk</name>
</author>
<author><name sortKey="Bill, Tj" uniqKey="Bill T">TJ Bill</name>
</author>
<author><name sortKey="Brooks, Aw" uniqKey="Brooks A">AW Brooks</name>
</author>
<author><name sortKey="Lee, Ly" uniqKey="Lee L">LY Lee</name>
</author>
<author><name sortKey="Rathnayake, V" uniqKey="Rathnayake V">V Rathnayake</name>
</author>
<author><name sortKey="Ross, Ca" uniqKey="Ross C">CA Ross</name>
</author>
<author><name sortKey="Sargeant, Dp" uniqKey="Sargeant D">DP Sargeant</name>
</author>
<author><name sortKey="Strong, Cl" uniqKey="Strong C">CL Strong</name>
</author>
<author><name sortKey="Watts, P" uniqKey="Watts P">P Watts</name>
</author>
<author><name sortKey="Rajasekaran, S" uniqKey="Rajasekaran S">S Rajasekaran</name>
</author>
<author><name sortKey="Schiller, Mr" uniqKey="Schiller M">MR Schiller</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gould, Cm" uniqKey="Gould C">CM Gould</name>
</author>
<author><name sortKey="Diella, F" uniqKey="Diella F">F Diella</name>
</author>
<author><name sortKey="Via, A" uniqKey="Via A">A Via</name>
</author>
<author><name sortKey="Puntervoll, P" uniqKey="Puntervoll P">P Puntervoll</name>
</author>
<author><name sortKey="Gemund, C" uniqKey="Gemund C">C Gemund</name>
</author>
<author><name sortKey="Chabanis Davidson, S" uniqKey="Chabanis Davidson S">S Chabanis-Davidson</name>
</author>
<author><name sortKey="Michael, S" uniqKey="Michael S">S Michael</name>
</author>
<author><name sortKey="Sayadi, A" uniqKey="Sayadi A">A Sayadi</name>
</author>
<author><name sortKey="Bryne, Jc" uniqKey="Bryne J">JC Bryne</name>
</author>
<author><name sortKey="Chica, C" uniqKey="Chica C">C Chica</name>
</author>
<author><name sortKey="Seiler, M" uniqKey="Seiler M">M Seiler</name>
</author>
<author><name sortKey="Davey, Ne" uniqKey="Davey N">NE Davey</name>
</author>
<author><name sortKey="Haslam, Nj" uniqKey="Haslam N">NJ Haslam</name>
</author>
<author><name sortKey="Weatheritt, Rj" uniqKey="Weatheritt R">RJ Weatheritt</name>
</author>
<author><name sortKey="Budd, A" uniqKey="Budd A">A Budd</name>
</author>
<author><name sortKey="Hughes, T" uniqKey="Hughes T">T Hughes</name>
</author>
<author><name sortKey="Pas, J" uniqKey="Pas J">J Pas</name>
</author>
<author><name sortKey="Rychlewski, L" uniqKey="Rychlewski L">L Rychlewski</name>
</author>
<author><name sortKey="Trave, G" uniqKey="Trave G">G Trave</name>
</author>
<author><name sortKey="Aasland, R" uniqKey="Aasland R">R Aasland</name>
</author>
<author><name sortKey="Helmer Citterich, M" uniqKey="Helmer Citterich M">M Helmer-Citterich</name>
</author>
<author><name sortKey="Linding, R" uniqKey="Linding R">R Linding</name>
</author>
<author><name sortKey="Gibson, Tj" uniqKey="Gibson T">TJ Gibson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Obenauer, Jc" uniqKey="Obenauer J">JC Obenauer</name>
</author>
<author><name sortKey="Cantley, Lc" uniqKey="Cantley L">LC Cantley</name>
</author>
<author><name sortKey="Yaffe, Mb" uniqKey="Yaffe M">MB Yaffe</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author><name sortKey="Hoi Sze, S" uniqKey="Hoi Sze S">S hoi Sze</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Price, A" uniqKey="Price A">A Price</name>
</author>
<author><name sortKey="Ramabhadran, S" uniqKey="Ramabhadran S">S Ramabhadran</name>
</author>
<author><name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Buhler, J" uniqKey="Buhler J">J Buhler</name>
</author>
<author><name sortKey="Tompa, M" uniqKey="Tompa M">M Tompa</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author><name sortKey="Elkan, C" uniqKey="Elkan C">C Elkan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lawrence, Ce" uniqKey="Lawrence C">CE Lawrence</name>
</author>
<author><name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author><name sortKey="Boguski, Ms" uniqKey="Boguski M">MS Boguski</name>
</author>
<author><name sortKey="Liu, Js" uniqKey="Liu J">JS Liu</name>
</author>
<author><name sortKey="Neuwald, Af" uniqKey="Neuwald A">AF Neuwald</name>
</author>
<author><name sortKey="Wootton, Jc" uniqKey="Wootton J">JC Wootton</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rocke, E" uniqKey="Rocke E">E Rocke</name>
</author>
<author><name sortKey="Tompa, M" uniqKey="Tompa M">M Tompa</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Keich, U" uniqKey="Keich U">U Keich</name>
</author>
<author><name sortKey="Pevzner, P" uniqKey="Pevzner P">P Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Timothy, Lb" uniqKey="Timothy L">LB Timothy</name>
</author>
<author><name sortKey="Elkan, C" uniqKey="Elkan C">C Elkan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hertz, Gz" uniqKey="Hertz G">GZ Hertz</name>
</author>
<author><name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Eskin, E" uniqKey="Eskin E">E Eskin</name>
</author>
<author><name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pisanti, N" uniqKey="Pisanti N">N Pisanti</name>
</author>
<author><name sortKey="Carvalho, A" uniqKey="Carvalho A">A Carvalho</name>
</author>
<author><name sortKey="Marsan, L" uniqKey="Marsan L">L Marsan</name>
</author>
<author><name sortKey="Sagot, Mf" uniqKey="Sagot M">MF Sagot</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rajasekaran, S" uniqKey="Rajasekaran S">S Rajasekaran</name>
</author>
<author><name sortKey="Balla, S" uniqKey="Balla S">S Balla</name>
</author>
<author><name sortKey="Hsi Huang, C" uniqKey="Hsi Huang C">C hsi Huang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chin, Fyl" uniqKey="Chin F">FYL Chin</name>
</author>
<author><name sortKey="Leung, Hcm" uniqKey="Leung H">HCM Leung</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Davila, J" uniqKey="Davila J">J Davila</name>
</author>
<author><name sortKey="Balla, S" uniqKey="Balla S">S Balla</name>
</author>
<author><name sortKey="Rajasekaran, S" uniqKey="Rajasekaran S">S Rajasekaran</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rajasekaran, S" uniqKey="Rajasekaran S">S Rajasekaran</name>
</author>
<author><name sortKey="Dinh, H" uniqKey="Dinh H">H Dinh</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dinh, H" uniqKey="Dinh H">H Dinh</name>
</author>
<author><name sortKey="Rajasekaran, S" uniqKey="Rajasekaran S">S Rajasekaran</name>
</author>
<author><name sortKey="Kundeti, V" uniqKey="Kundeti V">V Kundeti</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bandyopadhyay, S" uniqKey="Bandyopadhyay S">S Bandyopadhyay</name>
</author>
<author><name sortKey="Sahni, S" uniqKey="Sahni S">S Sahni</name>
</author>
<author><name sortKey="Rajasekaran, S" uniqKey="Rajasekaran S">S Rajasekaran</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kuksa, P" uniqKey="Kuksa P">P Kuksa</name>
</author>
<author><name sortKey="Pavlovic, V" uniqKey="Pavlovic V">V Pavlovic</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Davila, J" uniqKey="Davila J">J Davila</name>
</author>
<author><name sortKey="Balla, S" uniqKey="Balla S">S Balla</name>
</author>
<author><name sortKey="Rajasekaran, S" uniqKey="Rajasekaran S">S Rajasekaran</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Connecticut</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Connecticut"><name sortKey="Mi, Tian" sort="Mi, Tian" uniqKey="Mi T" first="Tian" last="Mi">Tian Mi</name>
</region>
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
</country>
</tree>
</affiliations>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002006 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002006 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:3679804
   |texte=   Efficient algorithms for biological stems search
}}

Pour générer des pages wiki

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:23679045" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

Serveur d'exploration MERS

Efficient algorithms for biological stems search

Efficient algorithms for biological stems search

Source :

Descripteurs français

English descriptors

Abstract

Links toward previous steps (curation, corpus...)

Le document en format XML

Pour manipuler ce document sous Unix (Dilib)

Pour mettre un lien sur cette page dans le réseau Wicri

Pour générer des pages wiki

	Serveur d'exploration MERS
	Attention, ce site est en cours de développement ! Attention, site généré par des moyens informatiques à partir de corpus bruts. Les informations ne sont donc pas validées.